In the previous Section we discussed a fundamental issue associated with the direction of the negative gradient: it can - depending on the function being minimized - oscillate rapidly, leading to zig-zagging gradient descent steps that slow down minimization. In this Section we describe a popular enhancement to the standard gradient descent step, called momentum accelerated gradient descent, that is specifically designed to ameliorate this issue. The core of this idea comes from the field of time series analysis, and in particular from a tool for smoothing time series known as the exponential average. Here we first introduce the exponential average and then detail how it can be integrated into the standard gradient descent step in order to reduce this zig-zagging (when it occurs) and consequently speed up gradient descent.
In the figure below we show an example of a time series dataset. This particular example comes from a real snippet of a financial time series - a history of the price of a financial stock over 500 periods of time. However, time series datasets abound in science, engineering, and business (and are a subject of particular study in machine learning, as we discuss in a future Chapter). More specifically, a time series dataset consists of a sequence of $K$ ordered points $x^1,\,x^2,\,...,x^K$. Note that these points are assumed to be ordered, meaning that the point $x^1$ comes before (that is, it is created and / or collected before) $x^2$, the point $x^2$ before $x^3$, and so on. For example, we generate a time series of points whenever we run a local optimization scheme with steps
\begin{equation} \mathbf{w}^k = \mathbf{w}^{k-1} + \alpha \mathbf{d}^{k-1} \end{equation} which produces the time series of ordered points $\mathbf{w}^{1},\,\mathbf{w}^{2},\,...,\mathbf{w}^{K}$, each of which is potentially multi-dimensional.
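To make this concrete, here is a minimal sketch of how a local optimization run produces such an ordered sequence. It assumes the descent direction is the negative gradient, $\mathbf{d}^{k-1} = -\nabla g\left(\mathbf{w}^{k-1}\right)$, and uses a hypothetical quadratic $g\left(\mathbf{w}\right) = w_1^2 + 10\,w_2^2$. It also assumes the initial point is recorded as the first point of the series. The names `gradient_descent_history` and `grad_quadratic`, as well as the values of $\alpha$ and $K$, are chosen here purely for illustration.

```python
def gradient_descent_history(grad, w_init, alpha, K):
    """Take steps w^k = w^{k-1} + alpha * d^{k-1} with d = -grad(w),
    recording every point so the run forms an ordered series of K points."""
    w = list(w_init)
    history = [list(w)]               # first recorded point is the initial point
    for _ in range(K - 1):
        d = [-g for g in grad(w)]     # assumed direction: the negative gradient
        w = [w_i + alpha * d_i for w_i, d_i in zip(w, d)]
        history.append(list(w))
    return history


def grad_quadratic(w):
    """Gradient of the illustrative function g(w) = w_1^2 + 10 * w_2^2."""
    return [2.0 * w[0], 20.0 * w[1]]


# Each entry of history is one (two-dimensional) point of the series.
history = gradient_descent_history(grad_quadratic, w_init=[1.0, 1.0], alpha=0.09, K=5)
for k, w_k in enumerate(history, start=1):
    print(k, w_k)
```

With this particular steplength the second coordinate changes sign at every step, so even this small example exhibits the zig-zagging behavior that smoothing is meant to reduce.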
In any case, because the raw values of a time series often oscillate or zig-zag up and down, it is common practice to smooth them to remove these zig-zagging motions prior to further analysis. The exponential average is one of the most popular such smoothing techniques for time series, and is used widely in applications in which time series arise. We show the result of smoothing the example time series below via the exponential average - with the resulting exponential average shown as a pink curve. Note that the first average point - set to be the first point of the input series $x^1$ - is shown as a pink dot.
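As a preview of the smoothing described above, the sketch below computes an exponential average using its common recursive form, $h^k = \beta\,h^{k-1} + \left(1 - \beta\right)x^k$, initialized at $h^1 = x^1$ as noted above. The symbol $h^k$, the smoothing parameter $\beta \in \left[0,\,1\right]$, the function name `exponential_average`, and the short zig-zagging input series are assumptions made here for illustration. They are not taken from the financial dataset shown in the figure.

```python
def exponential_average(x, beta):
    """Exponentially average the ordered series x.

    The first average is set to the first input point; every later average
    mixes the previous average (weight beta) with the new point (weight 1 - beta).
    """
    if not x:
        return []
    h = [x[0]]
    for x_k in x[1:]:
        h.append(beta * h[-1] + (1.0 - beta) * x_k)
    return h


# A small, made-up zig-zagging series used only to illustrate the effect.
series = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
smoothed = exponential_average(series, beta=0.8)
print(smoothed)
```

Larger values of $\beta$ place more weight on the running average and so smooth more aggressively, while $\beta = 0$ simply returns the original series.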